124 research outputs found

    Further improvements of Steiner tree approximations

    The Steiner tree problem asks for a shortest tree connecting a given set of terminal points in a metric space. We suggest an improved, fast heuristic for the Steiner problem in graphs and in the rectilinear plane. This heuristic finds a Steiner tree at most 1.757 and 1.267 times longer than the optimal solution in graphs and the rectilinear plane, respectively.
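
For orientation, the ratios quoted above can be compared with the classical minimum-spanning-tree baseline, which is at most 2 times longer than the optimum. The block below is a minimal sketch of that baseline only, not the heuristic from the abstract. The names `dijkstra` and `mst_heuristic_length` and the toy graph are illustrative assumptions, and the graph is assumed to be connected and undirected.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances from source; graph maps node -> {neighbor: weight}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def mst_heuristic_length(graph, terminals):
    """Length of a minimum spanning tree of the terminals' metric closure (Prim)."""
    terminals = list(terminals)
    dist = {t: dijkstra(graph, t) for t in terminals}
    in_tree = {terminals[0]}
    total = 0.0
    while len(in_tree) < len(terminals):
        # Cheapest shortest-path connection from the current tree to a new terminal.
        w, v = min(
            (dist[u][v], v)
            for u in in_tree
            for v in terminals
            if v not in in_tree
        )
        total += w
        in_tree.add(v)
    return total


# Toy instance (hypothetical): three terminals around one non-terminal hub "c".
graph = {
    "c": {"a": 1.0, "b": 1.0, "d": 1.0},
    "a": {"c": 1.0},
    "b": {"c": 1.0},
    "d": {"c": 1.0},
}
baseline = mst_heuristic_length(graph, ["a", "b", "d"])  # 4.0; the optimal tree via "c" has length 3.0
```

On this instance the baseline is 4/3 times longer than the optimum, because it never uses the hub as a Steiner point. Closing that kind of gap is what improved heuristics aim at.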

    Area fill synthesis for uniform layout density


    Filling algorithms and analyses for layout density control


    Efficient error correction and haplotypes reconstruction for deep sequencing of hepatitis c amplicons

    Section 1. Information Security and Computer Data Analysis. We present two new highly efficient pyrosequencing error correction algorithms: (i) k-mer-based error correction (KEC); and (ii) empirical frequency threshold (ET). Both were compared to the recently published clustering algorithm SHORAH to evaluate their relative performance using 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. We found that all three algorithms showed similar performance in terms of finding true haplotypes, but the KEC and ET methods significantly outperformed SHORAH both in their ability to remove false haplotypes and in estimating the frequency of true ones.

    A PTAS for planar group Steiner tree via spanner bootstrapping and prize collecting

    We present the first polynomial-time approximation scheme (PTAS), i.e., (1 + ϵ)-approximation algorithm for any constant ϵ > 0, for the planar group Steiner tree problem (in which each group lies on a boundary of a face). This result improves on the best previous approximation factor of O(log n (log log n)^{O(1)}). We achieve this result via a novel and powerful technique called spanner bootstrapping, which allows one to bootstrap from a superconstant approximation factor (even superpolynomial in the input size) all the way down to a PTAS. This is in contrast with the popular existing approach for planar PTASs of constructing lightweight spanners in one iteration, which notably requires a constant-factor approximate solution to start from. Spanner bootstrapping removes one of the main barriers for designing PTASs for problems which have no known constant-factor approximation (even on planar graphs), and thus can be used to obtain PTASs for several difficult-to-approximate problems. Our second major contribution required for the planar group Steiner tree PTAS is a spanner construction, which reduces the graph to have total weight within a factor of the optimal solution while approximately preserving the optimal solution. This is particularly challenging because group Steiner tree requires deciding which terminal in each group to connect by the tree, making it much harder than recent previous approaches to construct spanners for planar TSP by Klein [SIAM J. Computing 2008], subset TSP by Klein [STOC 2006], Steiner tree by Borradaile, Klein, and Mathieu [ACM Trans. Algorithms 2009], and Steiner forest by Bateni, Hajiaghayi, and Marx [J. ACM 2011] (and its improvement to an efficient PTAS by Eisenstat, Klein, and Mathieu [SODA 2012]).
    The main conceptual contribution here is realizing that selecting which terminals may be relevant is essentially a complicated prize-collecting process: we have to carefully weigh the cost and benefits of reaching or avoiding certain terminals in the spanner. Via a sequence of involved prize-collecting procedures, we can construct a spanner that reaches a set of terminals that is sufficient for an almost-optimal solution. Our PTAS for planar group Steiner tree implies the first PTAS for geometric Euclidean group Steiner tree with obstacles, as well as a (2 + ϵ)-approximation algorithm for group TSP with obstacles, improving over the best previous constant-factor approximation algorithms. By contrast, we show that planar group Steiner forest, a slight generalization of planar group Steiner tree, is APX-hard on planar graphs of treewidth 3, even if the groups are pairwise disjoint and every group is a vertex or an edge.

    Estimation of alternative splicing isoform frequencies from RNA-Seq data

    Background: Massively parallel whole transcriptome sequencing, commonly referred to as RNA-Seq, is quickly becoming the technology of choice for gene expression profiling. However, due to the short read length delivered by current sequencing technologies, estimation of expression levels for alternative splicing gene isoforms remains challenging. Results: In this paper we present a novel expectation-maximization algorithm for inference of isoform- and gene-specific expression levels from RNA-Seq data. Our algorithm, referred to as IsoEM, is based on disambiguating information provided by the distribution of insert sizes generated during sequencing library preparation, and takes advantage of base quality scores, strand and read pairing information when available. The open source Java implementation of IsoEM is freely available at http://dna.engr.uconn.edu/software/IsoEM/. Conclusions: Empirical experiments on both synthetic and real RNA-Seq datasets show that IsoEM has scalable running time and outperforms existing methods of isoform and gene expression level estimation. Simulation experiments confirm previous findings that, for a fixed sequencing cost, using reads longer than 25-36 bases does not necessarily lead to better accuracy for estimating expression levels of annotated isoforms and genes.
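
The general idea of expectation-maximization for isoform frequencies can be illustrated with a deliberately simplified sketch. This is not IsoEM: it ignores insert sizes, base quality scores, strand and pairing information, and it assumes all isoforms have equal effective length. The function name `em_isoform_frequencies` and the toy read set are hypothetical.

```python
def em_isoform_frequencies(read_compat, isoforms, iterations=200):
    """Estimate isoform frequencies from reads given as sets of compatible isoforms.

    Simplifying assumption: every compatible isoform is equally likely to
    generate a read apart from its frequency (no length or insert-size model).
    """
    freq = {iso: 1.0 / len(isoforms) for iso in isoforms}
    for _ in range(iterations):
        # E-step: split each read fractionally among its compatible isoforms.
        expected = {iso: 0.0 for iso in isoforms}
        for compat in read_compat:
            norm = sum(freq[iso] for iso in compat)
            if norm == 0.0:
                continue  # read is incompatible with every isoform that has mass
            for iso in compat:
                expected[iso] += freq[iso] / norm
        # M-step: renormalize the expected read counts into frequencies.
        total = sum(expected.values())
        if total == 0.0:
            break
        freq = {iso: expected[iso] / total for iso in isoforms}
    return freq


# Toy data (hypothetical): 3 reads unique to A, 1 unique to B, 4 ambiguous.
reads = [{"A"}] * 3 + [{"B"}] + [{"A", "B"}] * 4
freq = em_isoform_frequencies(reads, ["A", "B"])
```

In this toy case the ambiguous reads end up divided in the same 3:1 proportion as the unique reads, so the estimate converges to 0.75 for A and 0.25 for B.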

    Efficient error correction for next-generation sequencing of viral amplicons

    Background: Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results: In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions: Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm
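
The abstract does not spell out how KEC works, so the following is only a generic sketch of the idea behind k-mer-based error detection: k-mers that occur rarely across all reads are treated as likely sequencing errors. It is not the published KEC algorithm and performs no correction. The function names, the threshold `min_count`, and the toy reads are illustrative assumptions.

```python
from collections import Counter


def kmers(read, k):
    """All overlapping substrings of length k in a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def flag_suspect_reads(reads, k, min_count):
    """Indices of reads containing at least one k-mer seen fewer than min_count times."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    suspects = []
    for idx, read in enumerate(reads):
        if any(counts[km] < min_count for km in kmers(read, k)):
            suspects.append(idx)
    return suspects


# Toy data (hypothetical): five identical reads and one with a single substitution.
reads = ["ACGTAC"] * 5 + ["ACGAAC"]
suspects = flag_suspect_reads(reads, k=3, min_count=2)  # [5]
```

Only the last read is flagged, because its k-mers CGA, GAA and AAC each occur once. A real tool would also have to separate rare true haplotypes from errors, which a fixed count threshold cannot do on its own.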

    Inferring viral quasispecies spectra from 454 pyrosequencing reads

    Background: RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software was originally designed for single genome assembly and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences. Results: In this paper, we introduce a new Viral Spectrum Assembler (ViSpA) method for quasispecies spectrum reconstruction and compare it with the state-of-the-art ShoRAH tool on both simulated and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. Experimental results show that ViSpA outperforms ShoRAH on simulated error-free reads, correctly assembling 10 out of 10 quasispecies and 29 sequences out of 40 quasispecies. While ShoRAH has a significant advantage over ViSpA on reads simulated with sequencing errors due to its advanced error correction algorithm, ViSpA is better at assembling the simulated reads after they have been corrected by ShoRAH. ViSpA also outperforms ShoRAH on real 454 reads. Indeed, the 7 most frequent sequences reconstructed by ViSpA from a real HCV dataset are viable (do not contain internal stop codons), and the most frequent sequence was within 1% of the actual open reading frame obtained by cloning and Sanger sequencing. In contrast, only one of the sequences reconstructed by ShoRAH is viable. On a real HIV dataset, ShoRAH correctly inferred only 2 quasispecies sequences with at most 4 mismatches whereas ViSpA correctly reconstructed 5 quasispecies with at most 2 mismatches, and 2 out of 5 sequences were inferred without any mismatches. ViSpA source code is available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html. Conclusions: ViSpA enables accurate viral quasispecies spectrum reconstruction from 454 pyrosequencing reads. We are currently exploring extensions applicable to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations.